Description Background & Context

The Thera bank recently saw a steep decline in the number of users of its credit cards. Credit cards are a good source of income for banks because of the different kinds of fees they charge, such as annual fees, balance transfer fees, cash advance fees, late payment fees, foreign transaction fees, and others. Some fees are charged to every user irrespective of usage, while others are charged only under specified circumstances.

Customers leaving the credit card services would lead to losses for the bank. The bank therefore wants to analyze its customer data, identify the customers who are likely to leave their credit card services, and understand why, so that it can improve in those areas.

As a data scientist at Thera bank, you need to build a classification model that will help the bank improve its services so that customers do not renounce their credit cards.

You need to identify the best possible model that delivers the required performance.

Objective

 - Explore and visualize the dataset.
 - Build a classification model to predict if the customer is going to churn or not.
 - Optimize the model using appropriate techniques.
 - Generate a set of insights and recommendations that will help the bank.

Data Dictionary:

 - CLIENTNUM: Client number. Unique identifier for the customer holding the account
 - Attrition_Flag: Internal event (customer activity) variable - if the account is closed then "Attrited Customer" else "Existing Customer"
 - Customer_Age: Age in Years
 - Gender: Gender of the account holder
 - Dependent_count: Number of dependents
 - Education_Level: Educational Qualification of the account holder - Graduate, High School, Unknown, Uneducated, College (refers to a college student), Post-Graduate, Doctorate
 - Marital_Status: Marital Status of the account holder
 - Income_Category: Annual Income Category of the account holder
 - Card_Category: Type of Card
 - Months_on_book: Period of relationship with the bank
 - Total_Relationship_Count: Total no. of products held by the customer
 - Months_Inactive_12_mon: No. of months inactive in the last 12 months
 - Contacts_Count_12_mon: No. of Contacts between the customer and bank in the last 12 months
 - Credit_Limit: Credit Limit on the Credit Card
 - Total_Revolving_Bal: The balance that carries over from one month to the next is the revolving balance
 - Avg_Open_To_Buy: Open to Buy refers to the amount left on the credit card to use (Average of last 12 months)
 - Total_Trans_Amt: Total Transaction Amount (Last 12 months)
 - Total_Trans_Ct: Total Transaction Count (Last 12 months)
 - Total_Ct_Chng_Q4_Q1: Ratio of the total transaction count in 4th quarter and the total transaction count in 1st quarter
 - Total_Amt_Chng_Q4_Q1: Ratio of the total transaction amount in 4th quarter and the total transaction amount in 1st quarter
 - Avg_Utilization_Ratio: Represents how much of the available credit the customer spent

Business Insights

There are a large number of missing values in:

 - Education_Level
 - Marital_Status

We will have to look into these attributes, in addition to Income_Category, to see whether we can impute the data or need to drop those rows.
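A minimal sketch of the missing-value check, assuming the data is loaded into a pandas DataFrame named `df`. The toy rows and category labels below are illustrative only, not the real customer data. Because a placeholder such as 'abc' is not a null, Income_Category needs a value count as well as a null count.

```python
import numpy as np
import pandas as pd

# Illustrative toy rows using column names from the data dictionary;
# the notebook would load the full customer file instead.
df = pd.DataFrame({
    "Education_Level": ["Graduate", np.nan, "High School", np.nan],
    "Marital_Status": ["Married", "Single", np.nan, "Married"],
    "Income_Category": ["$60K - $80K", "abc", "Less than $40K", "abc"],
})

# Count real nulls per column, largest first.
missing = df.isnull().sum().sort_values(ascending=False)
print(missing)

# Placeholder values are not nulls, so inspect the distinct levels too.
print(df["Income_Category"].value_counts())
```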

Initial Data Observations

Fixing Education_Level, Marital_Status & Income_Category

There is no correlation strong enough to impute a usable value in place of the 'abc' entries in Income_Category, so we will drop all rows with that value.

Between the missing-value treatment and the bad income data, we have cut about 30% of our data set.
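A minimal sketch of the drop-and-measure step, assuming rows with nulls in Education_Level or Marital_Status are dropped and rows with the 'abc' placeholder in Income_Category are removed. The toy frame is illustrative, so its removal share will not match the roughly 30% seen on the real data.

```python
import numpy as np
import pandas as pd

# Illustrative toy frame; the real notebook works on the full customer data.
df = pd.DataFrame({
    "Education_Level": ["Graduate", np.nan, "High School", "Doctorate", "Uneducated"],
    "Marital_Status": ["Married", "Single", np.nan, "Married", "Single"],
    "Income_Category": ["$60K - $80K", "Less than $40K", "$40K - $60K", "abc", "$80K - $120K"],
})

n_before = len(df)

# Drop rows missing education or marital status, then rows whose
# income category is the placeholder 'abc'.
cleaned = df.dropna(subset=["Education_Level", "Marital_Status"])
cleaned = cleaned[cleaned["Income_Category"] != "abc"]

pct_removed = 100 * (n_before - len(cleaned)) / n_before
print(f"Removed {pct_removed:.1f}% of rows")
```

Reporting the removed share explicitly makes it easy to judge whether dropping rows is acceptable or whether imputation is worth revisiting.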

EDA


Reusable Functions
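As an example of the kind of reusable EDA helper this section refers to, here is a sketch of a hypothetical `category_summary` function (the notebook's own helpers are not shown). It returns the count and percentage share of each level of a categorical column; the Card_Category values are illustrative.

```python
import pandas as pd


def category_summary(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical helper: count and percentage share of each level of a column."""
    counts = df[column].value_counts()
    return pd.DataFrame({
        "count": counts,
        "percent": (100 * counts / counts.sum()).round(1),
    })


# Illustrative data only.
df = pd.DataFrame({"Card_Category": ["Blue", "Blue", "Blue", "Silver"]})
summary = category_summary(df, "Card_Category")
print(summary)
```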

Bivariate Analysis
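One common way to compare a categorical attribute against churn is a row-normalized crosstab, which gives the share of each Attrition_Flag outcome within each level. The sketch below uses a few illustrative rows; the notebook's actual bivariate plots are not shown here.

```python
import pandas as pd

# Illustrative rows only.
df = pd.DataFrame({
    "Gender": ["F", "F", "M", "M", "M"],
    "Attrition_Flag": ["Attrited Customer", "Existing Customer",
                       "Existing Customer", "Existing Customer", "Attrited Customer"],
})

# normalize="index" divides each row by its total, so each gender's row sums to 1.
churn_by_gender = pd.crosstab(df["Gender"], df["Attrition_Flag"], normalize="index")
print(churn_by_gender)
```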

Data Preparation for Modeling
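A sketch of typical preparation steps, assuming the identifier is dropped, the target is encoded as 1 for "Attrited Customer", categoricals are one-hot encoded with `pd.get_dummies`, and the split is stratified. The toy rows, the 25% test size, and the random seed are all illustrative choices.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Illustrative toy rows.
df = pd.DataFrame({
    "CLIENTNUM": range(8),
    "Attrition_Flag": ["Attrited Customer", "Existing Customer"] * 4,
    "Customer_Age": [45, 49, 51, 40, 40, 44, 51, 32],
    "Gender": ["M", "F", "M", "F", "M", "M", "M", "F"],
})

# CLIENTNUM carries no predictive signal; encode the target as 1 = churned.
X = df.drop(columns=["CLIENTNUM", "Attrition_Flag"])
y = (df["Attrition_Flag"] == "Attrited Customer").astype(int)

# One-hot encode categoricals; drop_first removes one redundant dummy per column.
X = pd.get_dummies(X, drop_first=True)

# Stratify so train and test keep the same churn proportion.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)
print(X.columns.tolist(), len(X_train), len(X_test))
```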

Evaluation Criteria: Accuracy / F1 Micro

We will use accuracy as the core metric for this model. The goal of the model is to identify which customers will leave the credit card services, so it is more important to know that we are identifying the right customers overall than to optimize specifically for recall or precision.
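The heading pairs accuracy with micro-averaged F1 because, for single-label classification, micro-F1 pools true positives, false positives, and false negatives across all classes and reduces to accuracy. The sketch below checks this on made-up labels and also prints recall and precision for the churn class, which can differ from accuracy.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Made-up labels: 1 = churned, 0 = stayed.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]

acc = accuracy_score(y_true, y_pred)
f1_micro = f1_score(y_true, y_pred, average="micro")
recall = recall_score(y_true, y_pred)        # share of churners caught
precision = precision_score(y_true, y_pred)  # share of flagged customers who churned
print(acc, f1_micro, recall, precision)
```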

Reusable Functions
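A reusable scoring helper keeps the model comparisons consistent. The sketch below uses a hypothetical `model_performance` function (not necessarily the notebook's own) that returns accuracy alongside recall, precision, and F1 for a fitted classifier. The imbalanced data comes from `make_classification` and stands in for the real features.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score


def model_performance(model, X, y) -> dict:
    """Hypothetical helper: score a fitted classifier on one data split."""
    pred = model.predict(X)
    return {
        "accuracy": accuracy_score(y, pred),
        "recall": recall_score(y, pred),
        "precision": precision_score(y, pred),
        "f1": f1_score(y, pred),
    }


# Synthetic imbalanced stand-in data (about 84% class 0).
X, y = make_classification(n_samples=200, weights=[0.84], random_state=1)
model = LogisticRegression(max_iter=1000).fit(X, y)
perf = model_performance(model, X, y)
print(perf)
```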

Training the Models
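A sketch of comparing several classifiers with stratified cross-validation on accuracy. The candidate list is hypothetical; the notebook's actual model set is not shown here. The data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced stand-in data.
X, y = make_classification(n_samples=300, weights=[0.84], random_state=1)

# Hypothetical candidate models.
models = {
    "logistic": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(random_state=1),
    "forest": RandomForestClassifier(n_estimators=100, random_state=1),
    "gbm": GradientBoostingClassifier(random_state=1),
}

# Stratified folds keep the churn ratio similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
scores = {
    name: cross_val_score(m, X, y, cv=cv, scoring="accuracy").mean()
    for name, m in models.items()
}
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```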

Oversampling and Training the Models
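A sketch of oversampling the minority (churn) class in the training set only, so the test set keeps the real class balance. This uses plain random oversampling via `sklearn.utils.resample`; the notebook may have used a different technique such as SMOTE from imbalanced-learn, which is not shown here. The arrays are synthetic.

```python
import numpy as np
from sklearn.utils import resample

# Synthetic training split: 16 churners (1) and 84 non-churners (0).
rng = np.random.RandomState(1)
X_train = rng.normal(size=(100, 3))
y_train = np.array([1] * 16 + [0] * 84)

minority = y_train == 1
# Draw minority rows with replacement until they match the majority count.
X_min_up, y_min_up = resample(
    X_train[minority], y_train[minority],
    replace=True, n_samples=int((~minority).sum()), random_state=1,
)
X_over = np.vstack([X_train[~minority], X_min_up])
y_over = np.concatenate([y_train[~minority], y_min_up])
print(np.bincount(y_over))
```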

Undersampling and Training the Models
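The mirror-image sketch for undersampling: keep a random subset of the majority class, without replacement, equal in size to the minority class. Again this is plain random undersampling via `sklearn.utils.resample` on synthetic arrays, applied to the training split only; the notebook's exact method is not shown.

```python
import numpy as np
from sklearn.utils import resample

# Synthetic training split: 16 churners (1) and 84 non-churners (0).
rng = np.random.RandomState(1)
X_train = rng.normal(size=(100, 3))
y_train = np.array([1] * 16 + [0] * 84)

majority = y_train == 0
# Sample majority rows without replacement down to the minority count.
X_maj_down, y_maj_down = resample(
    X_train[majority], y_train[majority],
    replace=False, n_samples=int((~majority).sum()), random_state=1,
)
X_under = np.vstack([X_maj_down, X_train[~majority]])
y_under = np.concatenate([y_maj_down, y_train[~majority]])
print(np.bincount(y_under))
```

Undersampling discards most of the majority rows, which is one plausible reason the unsampled models can score better on accuracy.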

Hyperparameter Tuning

Our original models, which were neither oversampled nor undersampled, seem to be performing the best. We will focus on three of those for tuning.
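A sketch of tuning one of the candidate models with `GridSearchCV`, scoring on accuracy to match the evaluation criterion. The random forest, the deliberately small grid, and the synthetic data are all illustrative assumptions; the notebook's chosen models and grids are not shown here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic imbalanced stand-in data.
X, y = make_classification(n_samples=300, weights=[0.84], random_state=1)

# Hypothetical, deliberately small grid (8 combinations).
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, None],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=1),
    param_grid,
    scoring="accuracy",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=1),
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

For larger grids, `RandomizedSearchCV` samples a fixed number of combinations instead of trying all of them.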

Building the Pipeline
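A sketch of bundling preprocessing and the model into a single scikit-learn `Pipeline`, so new customers are encoded exactly like the training data. The column subset, toy rows, labels, and the random forest are illustrative assumptions. `handle_unknown="ignore"` lets the encoder accept a category it never saw during fitting.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Illustrative toy rows and labels (1 = churned).
df = pd.DataFrame({
    "Customer_Age": [45, 49, 51, 40, 40, 44, 51, 32],
    "Total_Trans_Ct": [42, 33, 20, 20, 28, 24, 31, 36],
    "Gender": ["M", "F", "M", "F", "M", "M", "M", "F"],
    "Card_Category": ["Blue", "Blue", "Blue", "Blue", "Blue", "Silver", "Gold", "Blue"],
})
y = [0, 0, 1, 1, 0, 1, 0, 0]

categorical = ["Gender", "Card_Category"]
preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical)],
    remainder="passthrough",  # numeric columns pass through unchanged
)
pipe = Pipeline([
    ("preprocess", preprocess),
    ("model", RandomForestClassifier(random_state=1)),
])
pipe.fit(df, y)

# A new row, including a card type unseen during fitting, goes through the same steps.
new = pd.DataFrame({"Customer_Age": [50], "Total_Trans_Ct": [25],
                    "Gender": ["F"], "Card_Category": ["Platinum"]})
prediction = pipe.predict(new)
print(prediction)
```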